
[SPARK-13139][SQL] Parse Hive DDL commands ourselves #11573

Closed · wants to merge 17 commits

Conversation

andrewor14 (Contributor)

What changes were proposed in this pull request?

This patch is ported over from @viirya's changes in #11048. Currently, for most DDLs we just pass the query text directly to Hive. Instead, we should parse these commands ourselves and, in the future (not part of this patch), use the `HiveCatalog` to process these DDLs. This is a precursor to merging `SQLContext` and `HiveContext`.

Note: As of this patch we still pass the query text to Hive. The difference is that we now parse the commands ourselves so in the future we can just use our own catalog.
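For illustration, here is a minimal sketch of what "parsing the commands ourselves" amounts to; the `Token` and command classes below are hypothetical stand-ins, not the actual Catalyst types:

```scala
// Hypothetical sketch: match on Hive's AST tokens ourselves instead of
// handing the raw DDL text to Hive. Simplified stand-in types throughout.
case class Token(name: String, children: List[Token] = Nil)

sealed trait DdlCommand
case class CreateDatabase(name: String, ifNotExists: Boolean) extends DdlCommand
case class PassThroughToHive(sql: String) extends DdlCommand

def parseDdl(node: Token, sql: String): DdlCommand = node match {
  case Token("TOK_CREATEDATABASE", Token(dbName, Nil) :: args) =>
    CreateDatabase(dbName, ifNotExists = args.exists(_.name == "TOK_IFNOTEXISTS"))
  case _ =>
    // Anything we do not recognize still goes to Hive verbatim for now.
    PassThroughToHive(sql)
}
```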

How was this patch tested?

Jenkins, plus a new `DDLCommandSuite`, which accounts for about 40% of the changes here.

    buckets: Option[BucketSpec],
    // TODO: use `clustered` and `sorted` instead for simplicity
    noClustered: Boolean,
    noSorted: Boolean)(sql: String)
Contributor (Author):

@viirya, was there any reason why these have to be negative? They're much easier to understand as positives, i.e. `clustered` and `sorted`.

Member:

Just because the corresponding tokens are `TOK_NOT_CLUSTERED` and `TOK_NOT_SORTED`. Yeah, we can use the positive forms here.
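For example, the positive flags could be derived directly from the negative tokens; this is a hypothetical sketch, not the patch's actual code:

```scala
// Hypothetical sketch: derive positive `clustered`/`sorted` flags from
// the negative Hive tokens, per the renaming suggested above.
case class Token(name: String, children: List[Token] = Nil)

def bucketFlags(children: List[Token]): (Boolean, Boolean) = {
  val clustered = !children.exists(_.name == "TOK_NOT_CLUSTERED")
  val sorted = !children.exists(_.name == "TOK_NOT_SORTED")
  (clustered, sorted)
}
```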

@andrewor14 (Contributor, Author)

Note: The only changes I made on top of #11048 are addressing the outstanding comments in that patch and some minor cleanups. It's entirely possible that there are still things missing or incorrect, given that the original patch was not fully reviewed yet.

@hvanhovell @yhuai PTAL.

@SparkQA

SparkQA commented Mar 8, 2016

Test build #52630 has finished for PR 11573 at commit a663b5c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -64,6 +85,53 @@ private[sql] class SparkQl(conf: ParserConf = SimpleParserConf()) extends Cataly
      val tableIdent = extractTableIdent(nameParts)
      RefreshTable(tableIdent)

    case Token("TOK_CREATEDATABASE", Token(databaseName, Nil) :: createDatabaseArgs) =>
      val Seq(
        allowExisting,
Contributor:

Isn't `allowExisting` the exact opposite of `IF NOT EXISTS`? I'm asking because I have a similar case in my PR.

Member:

I don't quite understand your question. Could you clarify?

Contributor:

When I see a parameter named `allowExisting`, I assume that the table is allowed to already exist and, as a consequence, that the CREATE TABLE command is going to overwrite it. Why not name this `ifNotExists`?

Contributor (Author):

I've renamed it.
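A small sketch of the naming point, with hypothetical signatures:

```scala
// Hypothetical names, for illustration: the flag mirrors the SQL clause,
// so its meaning is unambiguous at the call site.
case class CreateDatabase(
    databaseName: String,
    ifNotExists: Boolean) // CREATE DATABASE IF NOT EXISTS ...

// ifNotExists = true  => silently skip creation if the database exists
// ifNotExists = false => fail if the database already exists
// Neither case overwrites an existing database.
```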

This commit cleans up the case class signatures representing
ALTER TABLE commands along with the code that parses them.
This commit also adds a lot of missing documentation that
was sorely needed for AlterTableCommandParser to be readable.

The cleanup in this commit concerns only ALTER TABLE commands,
but is expected to spread to other parts of the PR in subsequent
commits.
@andrewor14 (Contributor, Author)

@hvanhovell I've addressed most of your comments, except for a couple that I said I would fix later. Separately, I've also significantly cleaned up the logic and signatures in the ALTER TABLE code, so you should have a look at that as well. It's probably quite different from what it looked like when you last reviewed this patch.

I'll continue the cleanup in the rest of the patch shortly. I'm marking this as WIP in the meantime.

andrewor14 changed the title from [SPARK-13139][SQL] Parse Hive DDL commands ourselves to [SPARK-13139][SQL][WIP] Parse Hive DDL commands ourselves on Mar 9, 2016
andrewor14 changed the title from [SPARK-13139][SQL][WIP] Parse Hive DDL commands ourselves to [SPARK-13139][SQL] Parse Hive DDL commands ourselves on Mar 9, 2016
@andrewor14 (Contributor, Author)

@hvanhovell I believe that, as of the latest commit, I've addressed all of your comments. This is ready from my side, so I've removed the WIP tag. PTAL.

@SparkQA

SparkQA commented Mar 10, 2016

Test build #52773 has finished for PR 11573 at commit 6ad8dd5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 (Contributor, Author)

Also cc @rxin @yhuai.

@SparkQA

SparkQA commented Mar 10, 2016

Test build #52770 has finished for PR 11573 at commit 37854fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

 */
private[sql] case class BucketSpec(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
-   sortColumnNames: Seq[String])
+   sortColumns: Seq[(String, SortDirection)]) {
Contributor:

It is great to have `SortDirection` defined explicitly: when we create it, we know what the ordering will be.

However, this change implies that we can use descending order, which is actually not allowed, because `UnsafeKVExternalSorter` only sorts keys in ascending order (this sorter is used in `DynamicPartitionWriterContainer` when we generate bucketed files). So I think it is better to revert this change for now. In the future, we can revisit it once we store the sort direction in the metastore and can actually sort rows in a bucket file in descending order.
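A minimal sketch of that constraint, assuming Catalyst's `SortDirection` types; the `require` guard is illustrative, not Spark's actual validation:

```scala
import org.apache.spark.sql.catalyst.expressions.{Ascending, SortDirection}

// Illustrative guard only: reject the descending orders that
// UnsafeKVExternalSorter cannot produce when writing bucketed files.
case class BucketSpec(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumns: Seq[(String, SortDirection)]) {
  require(
    sortColumns.forall { case (_, dir) => dir == Ascending },
    "Only ascending sort order is supported when writing bucketed files")
}
```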

        }
      case _ => parseFailed("Invalid CREATE FUNCTION command", node)
    }.toMap
    CreateFunction(funcName, alias, resourcesMap, temp.isDefined)(node.source)
Contributor:

What will this map look like if I have `USING JAR 'jar1', JAR 'jar2', ...`?

Contributor:

Let's also have a test for this case.
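The concern is easy to demonstrate, since Scala's `toMap` keeps only the last value per key:

```scala
// The problem in miniature: keying function resources by type collapses
// duplicate JAR entries, because toMap keeps only the last value per key.
val resources = Seq("jar" -> "jar1", "jar" -> "jar2")
val resourcesMap = resources.toMap
// resourcesMap == Map("jar" -> "jar2"); "jar1" is silently dropped
```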

@yhuai (Contributor)

yhuai commented Mar 11, 2016

I have left a few comments. It is a good starting point. Thank you for working on this!

@SparkQA

SparkQA commented Mar 11, 2016

Test build #52937 has finished for PR 11573 at commit bd91b0f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai (Contributor)

yhuai commented Mar 11, 2016

OK. Let's merge this first (to avoid conflicts caused by other commits). We can address the comments in a follow-up PR.

asfgit closed this in 66d9d0e on Mar 11, 2016
andrewor14 deleted the parser-plus-plus branch on March 11, 2016 at 23:31
ghost pushed a commit to dbtsai/spark that referenced this pull request Mar 14, 2016
Addressing outstanding comments in apache#11573.

Jenkins, new test case in `DDLCommandSuite`

Author: Andrew Or <[email protected]>

Closes apache#11667 from andrewor14/ddl-parser-followups.
asfgit pushed a commit that referenced this pull request Mar 17, 2016
## What changes were proposed in this pull request?

As part of the effort to merge `SQLContext` and `HiveContext`, this patch implements an internal catalog called `SessionCatalog` that handles temporary functions and tables and delegates metastore operations to `ExternalCatalog`. Currently, this is still dead code, but in the future it will be part of `SessionState` and will replace `o.a.s.sql.catalyst.analysis.Catalog`.

A recent patch, #11573, parses Hive commands in Spark itself but still passes the entire query text to Hive. In a future patch, we will use `SessionCatalog` to implement the parsed commands.

## How was this patch tested?

800+ lines of tests in `SessionCatalogSuite`.

Author: Andrew Or <[email protected]>

Closes #11750 from andrewor14/temp-catalog.
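A rough sketch of the delegation pattern that commit describes, with illustrative names rather than the actual Spark API:

```scala
// Hypothetical sketch: temporary tables live in session-local state,
// while everything else is delegated to the external (metastore) catalog.
trait ExternalCatalog {
  def tableExists(db: String, table: String): Boolean
}

class SessionCatalog(external: ExternalCatalog) {
  // Temporary tables are session-local and never hit the metastore.
  private val tempTables = scala.collection.mutable.Set.empty[String]

  def createTempTable(name: String): Unit = tempTables += name

  def tableExists(db: String, table: String): Boolean =
    tempTables.contains(table) || external.tableExists(db, table)
}
```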
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Mar 17, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016